
    PeroxiBase: a database with new tools for peroxidase family classification

    Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. They use various peroxides as electron acceptors to catalyse a number of oxidative reactions and are present in almost all living organisms. We have created a peroxidase database (http://peroxibase.isb-sib.ch) that contains all identified peroxidase-encoding sequences (about 6000 sequences in 940 organisms), distributed among 11 superfamilies and about 60 subfamilies. All the sequences have been individually annotated and checked. PeroxiBase can be consulted through six major interlinked sections: ‘Classes’, ‘Organisms’, ‘Cellular localisations’, ‘Inducers’, ‘Repressors’ and ‘Tissue types’. General documentation on peroxidases and PeroxiBase is accessible in the ‘Documents’ section, which contains ‘Introduction’, ‘Class description’, ‘Publications’ and ‘Links’. In addition to the database, we have developed a tool to classify peroxidases based on the PROSITE profile methodology. To improve their specificity and to prevent overlaps between closely related subfamilies, the profiles were built using a new strategy based on the silencing of residues. This new profile construction method and its discriminatory capacity have been tested and validated using the different peroxidase families and subfamilies present in the database. The peroxidase classification tool, called PeroxiScan, is accessible at http://peroxibase.isb-sib.ch/peroxiscan.php
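
    The abstract does not spell out how residue silencing works. One plausible reading, assumed here, is that profile columns shared by closely related subfamilies are neutralised so that they stop contributing to the score, leaving only the discriminating columns to decide the assignment. The Python sketch below illustrates that idea with hypothetical, gapless toy profiles; it is not the PROSITE profile format or PeroxiScan's actual scoring, and the names `window_score`, `scan`, `classify` and the penalty `MISMATCH` are illustrative only.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy profile: one {residue: score} dict per column, no gaps.
Profile = List[Dict[str, float]]
MISMATCH = -1.0  # assumed penalty for a residue not listed in a column


def window_score(profile: Profile, window: str, silenced: Set[int]) -> float:
    """Sum column scores over one window; silenced columns contribute 0."""
    return sum(
        0.0 if i in silenced else col.get(aa, MISMATCH)
        for i, (col, aa) in enumerate(zip(profile, window))
    )


def scan(profile: Profile, seq: str, silenced: Set[int]) -> float:
    """Best gapless score of the profile over every window of seq."""
    width = len(profile)
    scores = [
        window_score(profile, seq[s:s + width], silenced)
        for s in range(len(seq) - width + 1)
    ]
    return max(scores, default=float("-inf"))


def classify(seq: str, profiles: Dict[str, Profile],
             silenced: Dict[str, Set[int]], threshold: float) -> Optional[str]:
    """Assign seq to the best-scoring subfamily, or None if none passes threshold."""
    scores = {name: scan(p, seq, silenced.get(name, set()))
              for name, p in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Two toy subfamilies that share their first two columns (H, R).
profiles = {
    "A": [{"H": 2.0}, {"R": 2.0}, {"W": 2.0}, {"F": 2.0}],
    "B": [{"H": 2.0}, {"R": 2.0}, {"Y": 2.0}, {"L": 2.0}],
}
# Silence the shared, non-discriminating columns in both profiles.
silenced = {"A": {0, 1}, "B": {0, 1}}
```

    With the shared columns silenced, `classify("GGHRWFGG", profiles, silenced, 3.0)` returns `"A"`, and subfamily B's spurious score on that sequence drops from 2.0 to -2.0. A sequence carrying only the shared motif, such as `"GGHRGG"`, is assigned to neither subfamily.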

    E1DS: catalytic site prediction based on 1D signatures of concurrent conservation

    Large-scale automatic annotation of protein sequences remains challenging in the post-genomic era. E1DS is designed to annotate enzyme sequences based on a repository of 1D signatures. The signatures are derived using a novel pattern-mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each block is considerably conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in the same sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually made up of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. E1DS is evaluated on a collection of enzymes whose catalytic sites are annotated in the Catalytic Site Atlas. Compared with the well-known pattern database PROSITE, predictions based on E1DS signatures are considered more sensitive in identifying catalytic sites and the residues involved. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/
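
    The signature definition above (at least three conserved blocks that must co-occur in sequence order, separated by arbitrary stretches) can be illustrated with a minimal Python sketch. It only checks whether a query matches a given signature; it does not reproduce E1DS's pattern-mining step, and the block patterns, the function name `match_signature` and the toy sequence are hypothetical. Blocks are assumed to be fixed-length regular expressions, so taking the leftmost match of each block after the end of the previous one is enough to decide whether an ordered match exists.

```python
import re
from typing import List, Optional


def match_signature(seq: str, blocks: List[str]) -> Optional[List[int]]:
    """Return the start index of each block matched in order, or None.

    Assumes every block is a fixed-length regex (e.g. literals and
    character classes), so the leftmost match is also the earliest-ending
    one and greedy left-to-right matching is sufficient.
    """
    if len(blocks) < 3:
        raise ValueError("a signature needs at least three blocks")
    pos = 0
    starts: List[int] = []
    for block in blocks:
        # Search only after the previous block, so block order is enforced.
        m = re.compile(block).search(seq, pos)
        if m is None:
            return None
        starts.append(m.start())
        pos = m.end()
    return starts
```

    For example, `match_signature("MKTGDSGGPLAHCLLQWVK", ["GDSGG", "H[CA]", "W[VI]"])` returns `[3, 11, 16]`, whereas the same blocks listed in a different order return `None`, because the block order is part of the signature.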

    seeMotif: exploring and visualizing sequence motifs in 3D structures

    Sequence motifs are important in the study of molecular biology. Motif discovery tools efficiently deliver many function-related signatures of proteins and greatly facilitate sequence annotation. As increasing numbers of motifs are detected experimentally or predicted computationally, characterizing the functional roles of motifs and identifying potential synergistic relationships between them are important next steps. A good way to investigate novel motifs is to use the abundant 3D structures that have also accumulated at an astounding rate in recent years. This article reports the development of the web service seeMotif, which provides users with an interactive interface for visualizing sequence motifs on protein structures from the Protein Data Bank (PDB). Researchers can quickly see the locations and conformations of multiple motifs across a number of related structures simultaneously. Because PDB sequences are usually shorter than those in sequence databases and/or may have missing residues, seeMotif offers two complementary approaches for selecting structures and mapping motifs to protein chains in structures. As more structures belonging to previously uncharacterized protein families become available, combining sequence and structure information offers good opportunities to understand protein functions in large-scale genome projects. Available at: http://seemotif.csie.ntu.edu.tw, http://seemotif.ee.ncku.edu.tw or http://seemotif.csbb.ntu.edu.tw
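
    Because a PDB chain may be shorter than the database sequence or have unobserved residues, motif positions cannot simply be copied across. A minimal Python sketch of one way to map them, assuming both the reference sequence and the observed chain sequence are available as plain strings, aligns the two with the standard library's `difflib.SequenceMatcher` (a stand-in chosen for illustration, not either of seeMotif's own approaches) and reports `None` for motif positions that are not observed in the structure. The function names and sequences are hypothetical; positions are 0-based and `end` is exclusive.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def residue_map(ref: str, chain: str) -> Dict[int, int]:
    """Map reference indices to chain indices for aligned identical residues."""
    mapping: Dict[int, int] = {}
    matcher = SequenceMatcher(None, ref, chain, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def map_motif(ref: str, chain: str, start: int, end: int) -> List[Optional[int]]:
    """Chain index for each reference position in [start, end), or None if unobserved."""
    mapping = residue_map(ref, chain)
    return [mapping.get(i) for i in range(start, end)]


# Toy example: the chain lacks the first three residues and "RDG" in the middle.
REF = "MKTAYIAKQRDGLSFVKSH"
CHAIN = "AYIAKQLSFVKSH"
```

    Here `map_motif(REF, CHAIN, 13, 17)` gives `[7, 8, 9, 10]`, while a motif spanning the unobserved segment, `map_motif(REF, CHAIN, 7, 11)`, gives `[4, 5, None, None]`.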

    Visualising biological data: a semantic approach to tool and database integration

    Motivation: In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customised for a particular purpose, and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers is still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets: much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research.
    Methods: To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that, together, create a toolkit that is both architecturally sound from a computing point of view, and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks.
    Results: The toolkit, named Utopia, is freely available from http://utopia.cs.man.ac.uk/.

    The Pseudouridine Synthase RPUSD4 Is an Essential Component of Mitochondrial RNA Granules

    Mitochondrial gene expression is a fundamental process that is largely dependent on nuclear-encoded proteins. Several steps of mitochondrial RNA processing and maturation, including RNA post-transcriptional modification, appear to be spatially organized into distinct foci, which we have previously termed mitochondrial RNA granules (MRGs). Although an increasing number of proteins have been localized to MRGs, a comprehensive analysis of the proteome of these structures is still lacking. Here, we have applied a microscopy-based approach that has allowed us to identify novel components of the MRG proteome. Among these, we have focused our attention on RPUSD4, an uncharacterized mitochondrial putative pseudouridine synthase. We show that RPUSD4 depletion leads to a severe reduction of the steady-state level of the 16S mitochondrial (mt) rRNA, with defects in the biogenesis of the mitoribosome large subunit and consequently in mitochondrial translation. We report that RPUSD4 binds 16S mt-rRNA, mt-tRNA(Met), and mt-tRNA(Phe), and we demonstrate that it is responsible for pseudouridylation of the latter. These data provide new insights into the relevance of RNA pseudouridylation in mitochondrial gene expression. This work was supported by the Swiss National Science Foundation Grant SNF310030B_160257/1, Wellcome Trust Grant 096919/Z/11/Z, and core funding of the Medical Research Council, UK.

    Combinatorial Clustering of Residue Position Subsets Predicts Inhibitor Affinity across the Human Kinome

    The protein kinases are a large family of enzymes that play fundamental roles in propagating signals within the cell. Because of the high degree of binding-site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches that compare the 3-dimensional geometry and physicochemical properties of key binding-site residue positions have been shown to be informative of inhibitor selectivity. The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method, introduced here, provides a semi-supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. Here, CCORPS is applied to the problem of identifying structural features of the kinase ATP binding site that are informative of inhibitor binding. CCORPS is shown to make perfect or near-perfect predictions of the binding affinity profile for 8 of the 38 kinase inhibitors studied, while having poor overall predictive ability for only 1 of the 38 compounds. Additionally, CCORPS is shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are also shown not to be rare. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors.

    An integrated ontology resource to explore and study host-virus relationships

    Our growing knowledge of viruses reveals how these pathogens manage to evade innate host defenses. A global scheme emerges in which many viruses usurp key cellular defense mechanisms and often inhibit the same components of antiviral signaling. To accurately describe these processes, we have generated a comprehensive dictionary for eukaryotic host-virus interactions. This controlled vocabulary has been detailed in 57 ViralZone resource web pages, which contain a global description of all molecular processes. To annotate viral gene products with this vocabulary, an ontology has been built in a hierarchy of UniProt Knowledgebase (UniProtKB) keyword terms, and corresponding Gene Ontology (GO) terms have been developed in parallel. The result is 65 UniProtKB keywords related to 57 GO terms, which have been used in 14,390 manual annotations and 908,723 automatic annotations, and propagated to an estimated 922,941 GO annotations. ViralZone pages, UniProtKB keywords and GO terms provide complementary tools to users, and the three resources have been linked to each other through the host-virus vocabulary.

    Collaborative annotation of genes and proteins between UniProtKB/Swiss-Prot and dictyBase

    UniProtKB/Swiss-Prot, a curated protein database, and dictyBase, the Model Organism Database for Dictyostelium discoideum, have established a collaboration to improve data sharing. One of the major steps in this effort was the ‘Dicty annotation marathon’, a week-long exercise with 30 annotators aimed at substantially increasing the number of D. discoideum proteins represented in UniProtKB/Swiss-Prot. The marathon led to the annotation of over 1000 D. discoideum proteins in UniProtKB/Swiss-Prot. Concomitantly, there was a large number of updates in dictyBase concerning gene symbols, protein names and gene models. This exercise demonstrates how UniProtKB/Swiss-Prot can work in very close cooperation with model organism databases and how the annotation of proteins can be accelerated through such collaborations.

    ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts

    Background: Once a new genome is sequenced, one of the important questions is to determine the presence and absence of biological pathways. Analysis of biological pathways in a genome is a complicated task, since many biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is well beyond the capability of biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools through an interactive spreadsheet-style web interface for reliable pathway analyses.
    Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface in which various sequence-based analyses can be performed: to compare enzymes (e.g. sequence clustering) or pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that the FASTA and CSR-HMM search methods generally outperform the Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used to obtain context information that is complementary to the conventional KEGG map representation.
    Conclusion: ComPath is an interactive workbench for pathway reconstruction, annotation, and analysis in which experts can perform various sequence, domain, and context analyses using an intuitive and interactive spreadsheet-style interface.
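
    The sensitivity/specificity trade-off reported for FASTA follows from the standard definitions: raising the E-value cutoff can only enlarge the set of predicted positives, so true positives (and sensitivity) never decrease while false positives accumulate and specificity never increases. A minimal Python sketch with made-up hits illustrates this; the data and the `sens_spec` helper are hypothetical and are not ComPath's evaluation code.

```python
from typing import List, Tuple

# Hypothetical benchmark entry: (E-value of the best hit, candidate truly has the function)
Hit = Tuple[float, bool]


def sens_spec(hits: List[Hit], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when every hit with E-value <= cutoff is called positive."""
    tp = sum(1 for e, is_true in hits if e <= cutoff and is_true)
    fn = sum(1 for e, is_true in hits if e > cutoff and is_true)
    fp = sum(1 for e, is_true in hits if e <= cutoff and not is_true)
    tn = sum(1 for e, is_true in hits if e > cutoff and not is_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


hits = [(1e-50, True), (1e-20, True), (1e-3, True),
        (1e-4, False), (0.5, False), (5.0, False)]
for cutoff in (1e-10, 1e-2, 1.0):
    print(cutoff, sens_spec(hits, cutoff))
```

    On this toy data, a cutoff of 1e-10 gives sensitivity 2/3 and specificity 1.0, while a cutoff of 1.0 recovers every true enzyme (sensitivity 1.0) at the cost of specificity 1/3.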